Note: This page's design, presentation and content have been created and enhanced using Claude (Anthropic's AI assistant) to improve visual quality and educational experience.
Week 5 • Sub-Lesson 1

🔍 The AI Literature Review Landscape

Understanding the ecosystem of AI-powered tools transforming how researchers find, organise, and synthesise academic literature

What We'll Cover

Literature review is arguably one of the most immediately useful applications of generative AI for researchers at every career stage. Whether you are a first-year Master's student trying to get your bearings in a new field or a seasoned academic keeping up with a fast-moving literature, AI tools can dramatically accelerate the process of finding, reading, and synthesising published work.

In this session, we map the landscape: what types of AI literature tools exist, what they actually do under the hood, and how to think about combining them into an effective workflow. We will also confront a critical limitation head-on — AI can find and summarise papers quickly, but it can also confidently cite papers that do not exist. Understanding why this happens (and how to protect yourself) is a theme we will return to throughout this week.

By the end of this sub-lesson, you should be able to identify the major categories of AI literature tools, understand the mechanisms that power them, and begin thinking about which combination might work best for your own research.

🗺️ Why AI Changes Literature Review

Before diving into specific tools, it helps to understand what makes literature review so difficult in the first place — and how AI addresses each of those challenges. Traditional literature review has several well-known pain points, each of which AI tools tackle in different ways.

⏰ Time and Scale

The problem: Academic publishing has grown exponentially. Biomedical research alone adds over 1 million papers per year to PubMed. No researcher can manually screen everything relevant to their topic.

How AI helps: Automated screening, summarisation, and data extraction let you process hundreds of abstracts in the time it would take to read ten. Tools like Elicit can extract structured data from dozens of papers simultaneously.

🔑 Keyword Dependency

The problem: Traditional database searches rely on exact keyword matching. Different fields use different terminology for the same concept. A search for "machine learning" might miss papers that only use "statistical learning" or "pattern recognition."

How AI helps: Semantic search tools understand meaning, not just keywords. You can describe your research question in plain language, and the tool finds conceptually related papers regardless of the specific terminology they use.

🌐 Language Barriers

The problem: Significant research is published in languages other than English. Valuable work in Mandarin, Spanish, German, or Portuguese journals is routinely overlooked by English-speaking researchers. The problem is even worse for lower-resource languages, where current AI tools often provide little help either.

How AI helps: Large language models can read and summarise papers in dozens of languages. Some tools can translate abstracts or even full papers on the fly, opening up previously inaccessible literatures.

📈 Keeping Current

The problem: Even after completing a thorough review, the literature keeps growing. Staying up to date requires continuous monitoring across multiple journals and preprint servers.

How AI helps: Tools like ResearchRabbit can send automated alerts when new papers appear that are related to your collection. Semantic Scholar offers AI-powered recommendations based on your reading history and library.

🔶 Coverage Gaps

The problem: Keyword searches inevitably miss papers. You find what you know to look for, but the most important papers are sometimes the ones you did not know existed — especially papers in adjacent fields that use different vocabulary.

How AI helps: Citation-mapping tools like Connected Papers and Litmaps build visual networks showing how papers relate to each other through citations. These tools reveal clusters of related work that no keyword search would have found.

📋 Synthesis Difficulty

The problem: Finding papers is only half the battle. Synthesising findings across dozens of studies — comparing methods, reconciling conflicting results, identifying gaps — is cognitively demanding and prone to bias.

How AI helps: Grounded chat tools let you upload a collection of papers and ask questions across all of them. You can ask "What methods did these studies use?" or "Where do these papers disagree?" and get an answer that draws from your actual sources.

⚠️ A critical caveat: AI tools address these pain points, but they do not eliminate them. Every AI literature tool has significant limitations — from incomplete coverage of databases to the persistent risk of hallucinated references. Think of these tools as powerful assistants, not replacements for your own critical judgment. We will explore specific failure modes in later sessions this week.

🧩 Three Categories of AI Literature Tools

The landscape of AI-powered literature tools can feel overwhelming. New products launch every month, and many make similar-sounding claims. To cut through the noise, it helps to group these tools into three broad categories based on how they fundamentally work. Each category has distinct strengths and weaknesses.

🔗 1. Citation-Based Discovery Tools

Examples: Connected Papers, ResearchRabbit, Litmaps, CoCites

These tools work by analysing citation chains and co-citation networks. Given a "seed" paper, they map out which other papers cite it, which papers it cites, and which papers frequently appear alongside it in reference lists. The result is a visual network of related work.

Key strength: They find papers you would never discover through keyword search alone — papers that are conceptually related because the research community treats them as related (by citing them together). This is fundamentally different from text-based search.

Important distinction: These tools typically rely on bibliometric data, not AI language models. Connected Papers uses co-citation analysis algorithms, not GPT. This means they do not hallucinate papers — every paper they show you is real and genuinely connected through the citation network.

🔬 2. Semantic Search & Synthesis Tools

Examples: Elicit, Consensus, Semantic Scholar, SciSpace 

These tools use AI language models to understand the meaning of your research question, not just the keywords. When you type a question like "Does mindfulness meditation reduce cortisol levels?", a semantic search tool interprets that question and finds papers whose content is conceptually relevant — even if they never use the word "mindfulness."

Key strength: They can extract structured data from papers (sample sizes, methods, findings), summarise results across multiple studies, and even answer specific research questions grounded in the literature. Elicit, for example, can create a table comparing the methods and findings of 20 papers in minutes.

Key risk: Because these tools use language models, they can sometimes generate inaccurate summaries or misrepresent a paper's findings. The quality of the underlying model matters enormously. Always verify key claims against the original paper.

💬 3. Grounded Chat & RAG Tools

Examples: NotebookLM (Google), Claude with uploaded papers, Perplexity, ChatPDF

These tools let you upload your own documents and have a conversation with them. They use a technique called Retrieval-Augmented Generation (RAG): when you ask a question, the tool searches through your uploaded documents for relevant passages, then uses a language model to compose an answer based on those specific passages.

Key strength: Responses are grounded in your actual sources. Rather than drawing on the model's general training data (which may contain errors), the model works directly with the text you provide. This significantly reduces hallucination risk, though it does not eliminate it entirely.

Key risk: The quality of the output depends on the quality and completeness of the documents you upload. If a crucial paper is missing from your collection, the tool cannot account for it. RAG tools can also occasionally misinterpret the source text, especially with complex statistical results or nuanced arguments.

💡 A note on overlap: Many tools increasingly span multiple categories. Semantic Scholar, for instance, offers both semantic search and citation graph exploration. Elicit combines semantic search with data extraction that uses RAG-like techniques. The categories above describe the primary mechanism each tool relies on, but the boundaries are blurring as the field evolves.

⚙️ What's Actually Happening Under the Hood

You do not need to be a machine learning engineer to use these tools effectively, but a conceptual understanding of how they work will help you predict when they will succeed and when they will fail. Here is a simplified look at the mechanisms powering each category.

🔗 Citation-Based Tools: Graph Analysis

These tools operate on large citation databases. Semantic Scholar, for example, maintains a corpus of over 220 million academic papers with their citation relationships. When you provide a seed paper, the tool builds a graph where each node is a paper and each edge is a citation link.

The tool then uses graph analysis algorithms to find papers that are most strongly connected to your seed — not just papers that directly cite it, but papers that share many of the same references (co-citation analysis) or are frequently cited by the same later papers (bibliographic coupling). The result is a map of the "neighbourhood" of your research topic as defined by the research community's own citation behaviour.

When this works well: In established fields with mature citation networks. When this struggles: For very new topics (preprints haven't been cited yet), interdisciplinary work (citations may not cross field boundaries), or topics where key papers are in databases the tool doesn't index.
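To make the two graph measures concrete, here is a minimal sketch of co-citation analysis and bibliographic coupling on a toy citation graph. All paper IDs are invented for illustration; real tools like Connected Papers run similar computations over databases of millions of papers.

```python
from itertools import combinations
from collections import Counter

# Toy citation data: each paper maps to the set of papers it cites.
# All IDs here are invented for illustration.
citations = {
    "paper_A": {"seed", "p1", "p2"},
    "paper_B": {"seed", "p1"},
    "paper_C": {"p2", "p3"},
    "paper_D": {"seed", "p3"},
}

def cocitation_counts(citations):
    """Count how often each pair of papers appears together in the
    reference list of the same later paper (co-citation analysis)."""
    pairs = Counter()
    for refs in citations.values():
        for a, b in combinations(sorted(refs), 2):
            pairs[(a, b)] += 1
    return pairs

def bibliographic_coupling(citations, x, y):
    """Number of references two papers share (bibliographic coupling)."""
    return len(citations[x] & citations[y])

pairs = cocitation_counts(citations)
# "seed" and "p1" are cited together by both paper_A and paper_B:
print(pairs[("p1", "seed")])                                     # 2
print(bibliographic_coupling(citations, "paper_A", "paper_B"))   # 2
```

High co-citation counts mark pairs the research community treats as related, which is exactly the signal keyword search cannot see.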

🔬 Semantic Search Tools: Embeddings and Vector Similarity

Semantic search tools use embedding models — neural networks that convert text into high-dimensional numerical vectors (lists of numbers). The key property of these vectors is that texts with similar meanings end up with similar vectors, regardless of the specific words used.

When you type a research question, it gets converted into a vector. The tool then compares this vector against pre-computed vectors for millions of paper abstracts, finding the papers whose vectors are closest to yours in this mathematical space. This is why you can search for "Does exercise help depression?" and find papers titled "Physical activity as an intervention for major depressive disorder" — the meanings are close even though the words differ.

When this works well: When your question is clear and the relevant literature uses broadly similar concepts. When this struggles: With highly technical or specialised queries where domain-specific nuance matters, or when the same term means different things in different fields (e.g., "bias" in statistics vs. psychology vs. electronics).
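The "closest vectors" comparison usually means cosine similarity. The sketch below uses hand-invented 3-dimensional vectors purely to show the arithmetic; real embedding models produce vectors with hundreds of dimensions, learned by a neural network rather than assigned by hand.

```python
import math

def cosine_similarity(u, v):
    """Directional closeness of two vectors: 1.0 means identical direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Invented toy embeddings (real models use hundreds of dimensions).
query   = [0.9, 0.1, 0.2]   # "Does exercise help depression?"
paper_1 = [0.8, 0.2, 0.1]   # "Physical activity as an intervention for MDD"
paper_2 = [0.1, 0.9, 0.3]   # an unrelated engineering paper

# The conceptually related paper scores higher despite sharing no keywords.
print(cosine_similarity(query, paper_1) > cosine_similarity(query, paper_2))  # True
```

This is why rephrasing a failed query can work: a new phrasing produces a new query vector, which may land closer to the relevant papers in the embedding space.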

💬 RAG Tools: Chunking, Retrieval, and Generation

Retrieval-Augmented Generation (RAG) is a multi-step process:

  • Chunking: Your uploaded documents are split into smaller segments (typically a few hundred words each).
  • Embedding: Each chunk is converted into a numerical vector, just like in semantic search.
  • Retrieval: When you ask a question, it is also converted into a vector. The system finds the chunks whose vectors are most similar to your question.
  • Generation: The retrieved chunks are provided to the language model as context, along with your question. The model generates an answer based specifically on this context.

When this works well: When your question can be answered by a specific passage in one of your documents. When this struggles: When the answer requires synthesising information spread across many different sections, when the chunking splits a key argument across two chunks, or when the question requires reasoning that goes beyond what is explicitly stated in the text.
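The four steps above can be sketched end to end. This toy version substitutes a word-count vector for a neural embedding model and stops at assembling the prompt (a real system would send it to an LLM); the document text and question are invented for illustration.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def embed(text, vocab):
    """Toy embedding: word-count vector over a fixed vocabulary.
    Real RAG systems use neural embedding models instead."""
    counts = Counter(tokenize(text))
    return [counts[w] for w in vocab]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# 1. Chunking: split an (invented) paper into small segments.
chunks = [
    "The study recruited 120 participants across three sites.",
    "Cortisol levels fell significantly in the meditation group.",
    "Limitations include a short follow-up period.",
]
vocab = sorted({w for c in chunks for w in tokenize(c)})

# 2. Embedding: pre-compute one vector per chunk.
chunk_vecs = [embed(c, vocab) for c in chunks]

# 3. Retrieval: embed the question, pick the most similar chunk.
question = "How many participants were in the study?"
q_vec = embed(question, vocab)
best = max(range(len(chunks)), key=lambda i: cosine(q_vec, chunk_vecs[i]))

# 4. Generation: the retrieved chunk becomes context for the language model.
prompt = f"Context: {chunks[best]}\n\nQuestion: {question}\nAnswer:"
print(chunks[best])  # the chunk about 120 participants
```

Notice that only the retrieved chunk reaches the model, which is why a question whose answer is scattered across many chunks, or split by an unlucky chunk boundary, can come back incomplete.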

🧠 Why this matters for you: Understanding these mechanisms helps you become a more strategic user. If you know that citation tools rely on graph data, you will not expect them to find a preprint published yesterday. If you know that semantic search depends on embeddings, you will rephrase a failed query rather than just adding more keywords. If you know that RAG retrieves chunks, you will understand why it sometimes gives incomplete answers to broad questions — and know to break your broad question into smaller, more specific ones.

🔄 Building a Combined Workflow

No single AI tool does everything well. Citation tools do not summarise. Semantic search tools do not let you have a conversation with your papers. RAG tools only know about the documents you upload. The most effective approach is to combine multiple tools strategically, using each where it is strongest.

Here is a recommended workflow that many researchers are finding effective. Think of it as a starting template that you can adapt to your own field and research style.

Step 1. Landscape Scan (Semantic Search: Elicit, Consensus, Semantic Scholar)
Start broad — search with your research question in plain language to understand what is out there and identify the main themes.

Step 2. Identify Seeds (Your judgment: manual selection)
From your initial scan, identify 3–5 key papers that are central to your topic — highly cited, recent reviews, or foundational studies.

Step 3. Map the Network (Citation-Based Discovery: Connected Papers, ResearchRabbit, Litmaps)
Feed your seed papers into citation-mapping tools to discover related work you would not have found through keyword search.

Step 4. Deep Reading & Synthesis (Grounded Chat / RAG: NotebookLM, Claude, Perplexity)
Read the papers yourself first, noting anything you do not understand. Then upload your most important papers to a grounded chat tool and ask specific questions across them: methods used, key findings, points of disagreement.

Step 5. Verify & Validate (Traditional Databases: Google Scholar, Web of Science, Scopus)
Cross-check every AI-surfaced paper against traditional databases. Confirm that cited papers actually exist and say what the AI claims they say.

Step 6. Manage & Organise (Reference Management: Zotero + AI plugins, Paperpile)
Use a reference manager to store and organise everything. Zotero is free, open source, and increasingly has AI plugins for tagging and annotation.

Step 7. Take Detailed Notes & Synthesise (Notetaker of your choice: Obsidian, Google Docs, pen and paper)
You have to engage actively at every step, and the step of collating what you have found into a form that makes sense to you is vital. Skip it, and you may simply have offloaded the cognitive work that builds your own understanding.
🔑 Key insight: The most effective researchers don't rely on a single AI tool — they combine multiple tools strategically, using each where it is strongest, and always verifying against traditional databases. No AI tool currently has complete coverage of all published work, and no AI summary should be trusted without verification against the original source.
⚠️ Step 5 is non-negotiable. It is tempting to skip verification when AI tools give you fluent, confident-sounding results. But AI language models can and do fabricate references — inventing plausible-sounding authors, journal names, and even DOIs. We will examine real examples of this failure mode in later sessions. For now, take this as a firm rule: every reference that enters your work must be independently verified.

📚 Readings for This Week

The following readings provide the evidence base and critical perspectives that underpin this week's sessions. The first three are core readings; the supplementary items are recommended for those who want to go deeper.

Core Readings

📄 Khraisha, Q., et al. (2024). "Can Large Language Models Replace Systematic Reviews?"

Research Synthesis Methods. Preprint available.

A careful evaluation of whether LLMs can perform the core tasks of systematic review — screening, extraction, synthesis — to the standard required. The answer is nuanced and instructive. Read at least the introduction and conclusion; this is both sobering and informative about the current state of AI capabilities in this domain.

📄 van Dis, E.A.M., et al. (2023). "ChatGPT: Five Priorities for Research."

Nature, 614. Free to read. https://doi.org/10.1038/d41586-023-00288-7

An early and influential piece outlining the key priorities the research community needs to address as AI tools become embedded in scientific practice. Published in Nature and widely cited, it frames many of the debates that are still ongoing. Short and accessible.

📄 Wagner, G., et al. (2022). "Artificial Intelligence and the Conduct of Literature Reviews."

Journal of Information Technology, 37(2). Open access.

A comprehensive overview of how AI was being applied to literature review processes before the current generative AI wave. Provides valuable context for understanding what has changed and what challenges remain. The framework they propose for thinking about AI-assisted review stages is still useful.

Supplementary Readings

Elicit Team. 

Elicit blog. Free. A clear explanation from the team behind one of the most popular AI research tools about what their system actually does, including its limitations. Useful for understanding the semantic search category.

Syriani, E., et al. (2023). "Assessing the Ability of ChatGPT to Screen Articles for Systematic Reviews."

arXiv:2307.06464. Free. An empirical study testing ChatGPT's ability to perform title and abstract screening — one of the most time-consuming steps in systematic review. Useful data on accuracy, sensitivity, and specificity.

Mollick, E.

One Useful Thing (Substack). Free. Ethan Mollick consistently provides some of the most balanced, practical writing on AI in academic contexts. This piece explores how AI is changing the research paper as a format.

Perplexity AI. (2024). "How does Perplexity work?"

Free online. Understanding how Perplexity's citation system works (and its limitations) is useful for anyone considering it as a research tool. Illustrates the RAG approach in practice.

Retraction Watch. (ongoing). "Did a prof invent his own Nobel Prize?"

Free online. A sobering and continuously updated resource documenting cases where AI-generated content has caused problems in published research. Essential context for understanding why verification matters.

Key Takeaways

  • AI literature tools fall into three main categories: citation-based discovery (graph analysis), semantic search and synthesis (embeddings and language models), and grounded chat / RAG (conversation with your documents).
  • Each category works through a fundamentally different mechanism, which determines when it will succeed and when it will fail. Understanding the mechanism helps you use the tool more effectively.
  • No single tool covers the full literature review workflow. The most effective approach combines multiple tools, using each where it is strongest.
  • Verification against traditional databases and original sources is non-negotiable. AI tools can and do fabricate references, misrepresent findings, and miss important papers.
  • The field is evolving rapidly. Tools that exist today may be superseded within months. The conceptual framework — understanding what types of tools exist and why — will remain useful even as specific products change.
👉 Up next: In the next session, we take a deep dive into the free tools available — Semantic Scholar, Connected Papers, ResearchRabbit, NotebookLM, and Google Scholar — with hands-on exploration of what each does best and where each falls short.